Incremental Web Search: Tracking Changes in the Web
نویسندگان
چکیده
A large amount of new information is posted on the Web every day. Large-scale web search engines often update their index slowly and are unable to present such information in a timely manner. In this thesis, we present our solutions of searching new information from the web by tracking the changes of web documents. First, we present the algorithms and techniques useful for solving the following problems: detecting web pages that have changed, extracting changes from different versions of a web page, and evaluating the significance of web changes. We propose a two-level change detector: MetaDetector and ContentDetector. The combined detector successfully reduces network traffic by 67%. Our algorithm for extracting web changes consists of three steps: document tree construction, document tree encoding and tree matching. It has linear time complexity and extracts effectively the changed content from different versions of a web page. In order to evaluate web changes, we propose a unified ranking framework combining three metrics: popularity ranking, quality ranking and evolution ranking. Our methods can identify and deliver important new information in a timely manner. Second, we present an application using the techniques and algorithms we developed, named “Web Daily News Assistant (WebDNA): finding what’s new
منابع مشابه
Towards Supporting Exploratory Search over the Arabic Web Content: The Case of ArabXplore
Due to the huge amount of data published on the Web, the Web search process has become more difficult, and it is sometimes hard to get the expected results, especially when the users are less certain about their information needs. Several efforts have been proposed to support exploratory search on the web by using query expansion, faceted search, or supplementary information extracted from exte...
متن کاملAnalysis of users’ query reformulation behavior in Web with regard to Wholis-tic/analytic cognitive styles, Web experience, and search task type
Background and Aim: The basic aim of the present study is to investigate users’ query reformulation behavior with regard to wholistic-analytic cognitive styles, search task type, and experience variables in using the Web. Method: This study is an applied research using survey method. A total of 321 search queries were submitted by 44 users. Data collection tools were Riding’s Cognitive Style A...
متن کاملA Technique for Improving Web Mining using Enhanced Genetic Algorithm
World Wide Web is growing at a very fast pace and makes a lot of information available to the public. Search engines used conventional methods to retrieve information on the Web; however, the search results of these engines are still able to be refined and their accuracy is not high enough. One of the methods for web mining is evolutionary algorithms which search according to the user interests...
متن کاملA New Hybrid Method for Web Pages Ranking in Search Engines
There are many algorithms for optimizing the search engine results, ranking takes place according to one or more parameters such as; Backward Links, Forward Links, Content, click through rate and etc. The quality and performance of these algorithms depend on the listed parameters. The ranking is one of the most important components of the search engine that represents the degree of the vitality...
متن کاملQuery Architecture Expansion in Web Using Fuzzy Multi Domain Ontology
Due to the increasing web, there are many challenges to establish a general framework for data mining and retrieving structured data from the Web. Creating an ontology is a step towards solving this problem. The ontology raises the main entity and the concept of any data in data mining. In this paper, we tried to propose a method for applying the "meaning" of the search system, But the problem ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2006